Finding and identifying text in 900+ languages
Similar resources
Finding and Identifying Text in 900+ Languages
This paper presents a trainable open-source utility that extracts text from arbitrary data files and disk images. It uses language models to detect character encodings automatically before extracting strings, and to identify languages and filter out non-textual strings after extraction. With a test set containing 923 languages, consisting of strings of at most 65 characters, a...
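The extract-then-identify flow described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the utility's actual method: it handles printable ASCII only (so it skips the encoding-detection step entirely), it uses character-trigram counts with add-one smoothing as a stand-in language model, and all function names and the toy training samples are hypothetical.

```python
import re
from collections import Counter
from math import log


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def trigram_counts(text: str) -> Counter:
    """Count character trigrams of the lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples: dict[str, str]) -> dict[str, Counter]:
    """Build one trigram-count model per language (assumed toy training data)."""
    return {lang: trigram_counts(text) for lang, text in samples.items()}


def score(text: str, model: Counter) -> float:
    """Log-likelihood of text under a model, with add-one smoothing."""
    total = sum(model.values())
    vocab = len(model) + 1  # +1 leaves probability mass for unseen trigrams
    return sum(
        n * log((model[gram] + 1) / (total + vocab))
        for gram, n in trigram_counts(text).items()
    )


def identify(text: str, models: dict[str, Counter]) -> str:
    """Return the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: score(text, models[lang]))


# Toy models; a real system would train on far more text per language.
models = train({
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt ueber den faulen hund und die katze sitzt auf der matte",
})

blob = b"\x00\x01the dog and the cat\xff\x00ab\x00der hund und die katze\x02"
for s in extract_strings(blob):
    print(identify(s, models), s)
```

Filtering of non-textual strings, which the abstract also mentions, could be approximated in this sketch by discarding strings whose best score falls below a threshold; that threshold would itself be an assumption to tune.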
Identifying the Strategies Persian EFL Learners Use in Reading an Expository Text in English and Examining Its Relation to Reading Proficiency and Motivation: A Think-Aloud Study
The main aim of this study was to examine the type and frequency of strategies that Persian-speaking students of English used while reading an English text. The study also examined differences in strategy use between readers with high and low levels of reading comprehension. The correlation between strategy use and reading comprehension on the one hand, and between strategy use and motivation on the other, was also tested in this study...
Finding Contradictions in Text
In this paper, I seek to understand the ways contradictions occur across texts and I describe a system for automatically detecting such constructions. Finding conflicting statements is foundational for text understanding, a problem which recently received a surge of interest in the computational linguistics community. Condoravdi et al. (2003) first recognized the importance of handling both ent...
Keynote Lecture 2: Text Analysis for Identifying Entities and Their Mentions in Indian Languages
The talk deals with the analysis of text at syntactic-semantic level to identify a common feature set which can work across various Indian languages for recognizing named entities and their mentions. The development of corpora and the method adopted to develop each module is discussed. The talk includes the evaluation of the common feature set using a statistical method which gives acceptable l...
Finding Sequences in Pattern Languages
This focus group was an experiment. We wanted to know what went on in people’s minds when they constructed sequences from an existing pattern language; i.e. how they selected patterns when attempting to solve a particular design problem. In this case we used the WU pattern language, which focuses on web usability (Graham, 2003). Ian Graham presented it with four of its patterns at EuroPLoP 2002...
Journal
Journal title: Digital Investigation
Year: 2012
ISSN: 1742-2876
DOI: 10.1016/j.diin.2012.05.004